Abstract. 



Within the scope of the Planck IDIS (Integrated Data Information System) 
project we have started to develop the data model for time-ordered data and 
full-sky maps. The data model is part of the Data Management Component 
(DMC), a software system designed according to a three-tier architecture 
which allows complete separation between data storage and processing. The 
DMC is already being used for simulation activities and the modeling of 
some foreground components. We have ingested several Galactic surveys into 
the database and used the science data-access interface to process the data. 
The data structure for full-sky maps utilises the HEALPix tessellation of 
the sphere. We have been able to obtain consistent measures of the angular 
power spectrum of the Galactic radio continuum emission between 408 MHz 
and 2417 MHz. 



1 Introduction 

The ESA satellite Planck will survey the microwave sky with unprecedented 
sensitivity from 30 to 850 GHz. These observations will generate a large data 
set, with inhomogeneous data types, which will need to be accessible to users 
spread across many institutes in Europe and the US. How well it will be 
possible to mine this wealth of information depends critically on the proper 
design of the Data Management Component (DMC); that is, on the efficiency 
and user-friendliness of the suite of tools that will be used to store, retrieve, 
process and query the data. 

The IDIS [1] project is a collaboration among the two Planck Data Processing 
Centres - in particular OAT of Trieste, the Max-Planck-Institut für 
Astrophysik and the Astrophysics Division of ESTEC. Within IDIS, we have 
started to develop the Planck DMC according to a 3-tier architecture. 



2 Giovanna Giardino et al. 



2 A 3-tier design for the Planck DMC 

The two main logical partitions of data management are the data storage 
system and its tools and the processing system and its tools. The problem with 
this type of partitioning is that processing and storage are not shielded from 
one another: changing the storage system implies making changes to some parts 
of the processing system (as illustrated in Fig. 1). Moreover, anyone concerned 
with the development of the processing techniques has to be aware of storage 
details, such as whether data are stored in files or in a database system. For a 
large project, involving hundreds of people (scientists and software engineers) 
and lasting for over a decade, this kind of approach is inefficient and can lead 
to a waste of resources. 




Fig. 1. In a 2-tier architecture changes to the storage system impact the processing 
system 

In a 3-tier design an additional layer is inserted between the data storage 
and the processing layer. This is the data layer and in our case this is an 
abstract layer. All the processing is done using abstract objects or interfaces. 
Manipulating abstract objects rather than real objects allows the user to 
develop programs which are more closely related to the way one thinks about 
a problem rather than the way a computer operates [0 . The extra tier keeps 
data and processing completely separate from each other: changing the data 
storage system does not have any implications for the processing (Fig. 2). This 
means, for instance, that if technology evolves and a more efficient storage 
system becomes available, the processing pipeline can be switched to access 
the data from the new storage system, without any disruption. In addition, 
thanks to this extra layer, scientists can develop the processing algorithms 
without having to worry about whether the data will be stored in files or in 
databases. 
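
The separation described above can be illustrated with a minimal Java sketch. 
It is not the actual DMC interface: the names SkyMapStore, ArrayBackedStore, 
TextBackedStore and MapStatistics are hypothetical, and the two backends are 
simple in-memory stand-ins for a database and a set of files. The point is 
only that the processing routine is written once against the abstract layer 
and never sees how the pixels are stored. 

```java
import java.util.HashMap;
import java.util.Map;

// Data layer: the only view of storage that processing code is allowed to see.
// Hypothetical interface, not the real DMC API.
interface SkyMapStore {
    double[] readPixels(String mapName);
}

// Storage backend A: keeps maps as arrays in memory (stand-in for a database).
final class ArrayBackedStore implements SkyMapStore {
    private final Map<String, double[]> maps = new HashMap<>();

    void put(String mapName, double[] pixels) {
        maps.put(mapName, pixels.clone());
    }

    @Override
    public double[] readPixels(String mapName) {
        double[] pixels = maps.get(mapName);
        if (pixels == null) {
            throw new IllegalArgumentException("unknown map: " + mapName);
        }
        return pixels.clone();
    }
}

// Storage backend B: keeps maps as comma-separated text (stand-in for files).
final class TextBackedStore implements SkyMapStore {
    private final Map<String, String> records = new HashMap<>();

    void put(String mapName, String commaSeparatedPixels) {
        records.put(mapName, commaSeparatedPixels);
    }

    @Override
    public double[] readPixels(String mapName) {
        String record = records.get(mapName);
        if (record == null) {
            throw new IllegalArgumentException("unknown map: " + mapName);
        }
        String[] fields = record.split(",");
        double[] pixels = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            pixels[i] = Double.parseDouble(fields[i].trim());
        }
        return pixels;
    }
}

// Processing layer: written against the interface, unaware of the backend.
final class MapStatistics {
    static double mean(SkyMapStore store, String mapName) {
        double[] pixels = store.readPixels(mapName);
        double sum = 0.0;
        for (double p : pixels) {
            sum += p;
        }
        return sum / pixels.length;
    }
}
```

Calling MapStatistics.mean with either backend gives the same result for the 
same pixel values, so swapping the storage system requires no change to the 
processing code. 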

The concept is very simple, but implementing this design requires extra 
thought and some extra work in the initial phases of the project. However, 
good design makes all the difference between allowing users (the scientists) 
to efficiently mine the data or having them spending hours debugging code. 

Fig. 2. In a 3-tier architecture Storage System and Processing System are 
decoupled from each other: changes to the storage do not affect the processing 
pipelines. One can also envisage different storage systems being used in parallel. 

We have started to develop the data access interfaces and the data 
structures for Planck time-ordered data and Planck full-sky maps. The latter 
uses the HEALPix tessellation of the sphere. Implementation of the system is 
done using an object oriented approach and the Java programming language, 
which offers a natural way of implementing abstract classes and interfaces. 
To implement the database we decided to use an object oriented database 
(Objectivity). The DMC is already being used for Planck simulation activities 
and the analysis of some CMB foregrounds. 
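
As an illustration of what a full-sky map object built on HEALPix might hold, 
the sketch below stores one value per pixel for a given resolution parameter 
Nside. It relies only on two standard HEALPix properties: a map has 
12 * Nside^2 pixels, and all pixels have equal area. The class name 
HealpixMapSketch is hypothetical and is not the DMC data structure; pixel 
ordering (RING or NESTED) and coordinate conversion are deliberately left out, 
and restricting Nside to powers of two is an assumption of this sketch. 

```java
// Hypothetical container for a HEALPix full-sky map; not the DMC class.
final class HealpixMapSketch {
    private final int nside;
    private final double[] pixels;

    HealpixMapSketch(int nside) {
        // Assumption of this sketch: Nside is a positive power of two.
        if (nside <= 0 || (nside & (nside - 1)) != 0) {
            throw new IllegalArgumentException("nside must be a positive power of two");
        }
        this.nside = nside;
        // Npix = 12 * Nside^2; multiplyExact throws instead of overflowing silently.
        int npix = Math.multiplyExact(12, Math.multiplyExact(nside, nside));
        this.pixels = new double[npix];
    }

    int nside() {
        return nside;
    }

    int npix() {
        return pixels.length;
    }

    // Every HEALPix pixel covers the same solid angle: 4*pi / Npix steradians.
    double pixelAreaSteradians() {
        return 4.0 * Math.PI / pixels.length;
    }

    double get(int pixelIndex) {
        return pixels[pixelIndex];
    }

    void set(int pixelIndex, double value) {
        pixels[pixelIndex] = value;
    }
}
```

For example, new HealpixMapSketch(4) holds 12 * 16 = 192 pixels. 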

3 The angular power spectrum of Galactic radio 
emission 

We have used this data management component and the database to measure 
the angular power spectrum of Galactic radio emission using some of the 
existing radio surveys. Galactic radio emission is a foreground signal when 
observing the CMB. The knowledge of the power spectra of the foreground 
components is important in order to quantify the level of contamination of 
the CMB observations at the different angular scales. Improved modeling of 
the foregrounds also allows more realistic simulations of the mission to be 
performed. 

The Haslam survey at 408 MHz @ , the Reich & Reich survey at 1420 MHz 
[Q and the Jonas survey at 2326 MHz || were fed into the database. The 
maps were ingested raw, as they are publicly available. No de-striping process 
was applied to the maps. The point sources were removed by median filtering. 
After point source removal, we could swiftly obtain a measure of the angular 
power spectrum of the diffuse emission at the different frequencies of the 
three maps. 
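
To make the point source removal step concrete, the following sketch applies 
a median filter to a small flat raster. It is only an illustration of the 
technique: the text does not specify the window size or how the filter was 
applied on the sphere, so the 2D array layout, the square window and the 
class name MedianFilterSketch are all assumptions. 

```java
import java.util.Arrays;

// Hypothetical helper: median filter on a flat 2D raster (not the DMC code).
final class MedianFilterSketch {

    // Replaces each pixel by the median of the (2*radius+1)^2 window around it.
    // Windows are clipped at the raster edges rather than padded.
    static double[][] medianFilter(double[][] image, int radius) {
        int rows = image.length;
        int cols = image[0].length;
        double[][] out = new double[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                int r0 = Math.max(0, r - radius);
                int r1 = Math.min(rows - 1, r + radius);
                int c0 = Math.max(0, c - radius);
                int c1 = Math.min(cols - 1, c + radius);
                double[] window = new double[(r1 - r0 + 1) * (c1 - c0 + 1)];
                int n = 0;
                for (int i = r0; i <= r1; i++) {
                    for (int j = c0; j <= c1; j++) {
                        window[n++] = image[i][j];
                    }
                }
                Arrays.sort(window);
                out[r][c] = (n % 2 == 1)
                        ? window[n / 2]
                        : 0.5 * (window[n / 2 - 1] + window[n / 2]);
            }
        }
        return out;
    }
}
```

A single bright pixel on a smooth background is replaced by the background 
level, which is why the filter suppresses point sources while leaving the 
large-scale diffuse emission largely intact. It also suppresses genuine 
small-scale structure, so the filtered spectrum is only reliable below some 
multipole. 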

We model these spectra with a power law of the form $C_\ell \propto \ell^{-\alpha}$ 
(see Giardino et al., in preparation, for more details about the analysis 
performed). At high Galactic latitudes (|b| > 20°) we derived a spectral index 
which is consistent for all three surveys and has an average value of 
$\alpha = 3.0 \pm 0.2$. The angular power spectra derived at high Galactic 
latitude are reported in Fig. 3 for the Haslam map, in Fig. 4 for the Reich & 
Reich map and in Fig. 5 for the Jonas map. The best-fit power law spectra for 
each case are also shown. The derived spectral indices are summarized in Table 1. 
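
One simple way to estimate the spectral index of such a power law is an 
ordinary least-squares straight-line fit in log-log space, since 
log C = log A - alpha * log l. The sketch below does exactly that. It is an 
assumption made for illustration, not necessarily the fitting procedure used 
for the values quoted here (which also rely on Monte Carlo simulation for the 
uncertainties); the class name PowerLawFit is hypothetical and the fit is 
unweighted. 

```java
// Hypothetical helper: unweighted log-log least-squares fit of C_l = A * l^(-alpha).
final class PowerLawFit {

    // Returns {alpha, A}. Requires at least two distinct, positive multipoles
    // and strictly positive power values.
    static double[] fit(double[] ell, double[] cl) {
        if (ell.length != cl.length || ell.length < 2) {
            throw new IllegalArgumentException("need two or more matching samples");
        }
        int n = ell.length;
        double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
        for (int i = 0; i < n; i++) {
            double x = Math.log(ell[i]);
            double y = Math.log(cl[i]);
            sx += x;
            sy += y;
            sxx += x * x;
            sxy += x * y;
        }
        // Standard closed-form slope and intercept of the best-fit line.
        double slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        double intercept = (sy - slope * sx) / n;
        return new double[] {-slope, Math.exp(intercept)};
    }
}
```

Restricting the input arrays to the multipole range of interest (for example 
2 to 70) is left to the caller. 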

Table 1. Spectral index of the angular power spectrum of diffuse radio emission at 
high Galactic latitude for three available surveys. $\sigma_\alpha$ is the standard 
deviation of $\alpha$. The impact of incomplete sky coverage and the Galactic plane 
cutoff has been evaluated through Monte Carlo simulation. 

Survey          $\nu$ [MHz]   $\alpha$   $\sigma_\alpha$   $\ell$ range 
Haslam          408           2.94       0.09              2-70 
Reich & Reich   1420          3.15       0.14              2-70 
Jonas           2326          2.92       0.07              2-100 

The same analysis has also been applied to the Parkes polarimetric survey 
of the Galactic Plane at 2417 MHz in order to derive the angular power 
spectrum of the polarised emission. From the analysis of the raw data of the 
polarised component we have obtained a spectral index of $\alpha = 1.9 \pm 0.3$ 
in the multipole range $\ell \in [100, 500]$. After median filtering the 
spectral index does not change significantly in the $\ell$ range 100-300 (see 
footnote 1), while the spectral index of the spectrum of the total intensity 
does (see Fig. 6). This is expected since the contribution of point sources to 
the polarised emission appears to be smaller than the point source 
contribution to the total intensity. Therefore, at 2.4 GHz and for regions at 
low Galactic latitudes, the angular power spectrum of polarised diffuse 
emission is significantly flatter than the angular power spectrum of total 
diffuse emission. 



1 $\ell < 300$ is the multipole range not affected by the median filter 
suppression of the high spatial-frequency signal. 

Fig. 3. The angular power spectrum of the Haslam survey at 408 MHz, for regions 
of the sky with |b| > 20°. Point sources were removed by median filtering. The 
aliasing noise introduced by the Galactic plane cutoff at 20° dominates over the 
steeply falling signal at $\ell > 100$. 

Fig. 4. The angular power spectrum of the Reich & Reich survey at 1420 MHz, for 
regions of the sky with |b| > 20° (as per Fig. 3). 

Fig. 5. The angular power spectrum of the Jonas survey at 2326 MHz, for regions 
of the sky with |b| > 20° (as per Fig. 3). 

Fig. 6. The angular power spectrum of the Parkes survey of the southern Galactic 
plane at 2417 MHz. The angular power spectrum of the total emission and the 
polarised fraction are shown, before and after median filtering (solid and dashed 
line respectively). 

4 Conclusions 

This initial version of the Planck DMC allows maps from large sky surveys to 
be handled efficiently and it can be used to perform scientific data analysis. 
From the point of view of the client (scientist) who is interested in data 
processing: 

• the tool with its 3-tier design provides a convenient way to access the 
data. It allows the user to construct and handle "scientific objects" (such 
as maps or spectra) without having to understand the technicalities of 
data storage (formatting, optimization issues) 

• switching between a storage implementation which uses a file system on disk 
(e.g. a set of FITS files) and one which uses an object oriented database on a 
server is effortless 

• the use of a data structure such as HEALPix has proved to be a crucial 
factor for the speed of operations involving spherical harmonic 
decomposition 

From the point of view of the software engineer who develops the data storage 
system: 

• a database offers a way of handling and managing the data which is more 
powerful than a simple file system 

• from benchmarking tests, the use of an object oriented database in our 
case proved to be preferable to a relational database, both for speed of data 
access and for the option of storing objects of any given complexity (that is, 
objects which refer to other objects, which in turn refer to other objects 
and so forth) 

Using this version of the Planck DMC we have derived the spectral indices 
of the angular power spectrum of diffuse radio emission from the available 
large-sky surveys. At high Galactic 
latitudes these are consistent with the spectral indices derived previously 
from the Haslam map and the Reich & Reich map ( JtJ ; Q ; ||] ) , but are more 
precise (Giardino et al., in preparation). 